Abstract

This paper concerns the problem of matrix completion, which is to estimate a
matrix from observations in a small subset of indices. We propose a calibrated
spectrum elastic net method with a sum of the nuclear and Frobenius penalties and
develop an iterative algorithm to solve the convex minimization problem. The
iterative algorithm alternates between imputing the missing entries in the incomplete
matrix by the current guess and estimating the matrix by a scaled soft-thresholding
singular value decomposition of the imputed matrix until the resulting matrix
converges. A calibration step follows to correct the bias caused by the Frobenius
penalty. Under proper coherence conditions and for suitable penalty levels, we
prove that the proposed estimator achieves an error bound of nearly optimal order
and in proportion to the noise level. This provides a unified analysis of the noisy
and noiseless matrix completion problems. Simulation results are presented to
compare our proposal with previous ones.



1 Introduction 

Let $\Theta \in \mathbb{R}^{d_1\times d_2}$ be a matrix of interest and $\Omega^* = \{1,\ldots,d_1\}\times\{1,\ldots,d_2\}$. Suppose we observe
vectors $(\omega_i, y_i)$,

$$y_i = \Theta_{\omega_i} + \varepsilon_i, \quad i = 1,\ldots,n, \tag{1}$$

where $\omega_i \in \Omega^*$ and $\varepsilon_i$ are random errors. We are interested in estimating $\Theta$ when $n$ is a small
fraction of $d_1d_2$. A well-known application of matrix completion is the Netflix problem, where $y_i$
is the rating of movie $b_i$ by user $a_i$ for $\omega_i = (a_i, b_i) \in \Omega^*$ [1]. In such applications, the proportion
of the observed entries is typically very small, so that the estimation or recovery of $\Theta$ is impossible
without a structure assumption on $\Theta$. In this paper, we assume that $\Theta$ is of low rank.

A focus of recent studies of matrix completion has been on a simpler formulation, also known
as exact recovery, where the observations are assumed to be uncorrupted, i.e. $\varepsilon_i = 0$. A direct
approach is to minimize $\mathrm{rank}(M)$ subject to $M_{\omega_i} = y_i$. An iterative algorithm was proposed in [5]
to project a trimmed SVD of the incomplete data matrix to the space of matrices of a fixed rank
$r$. The nuclear norm was proposed as a surrogate for the rank, leading to the following convex
minimization problem in a linear space [2]:

$$\hat\Theta^{(\mathrm{CR})} = \arg\min_M \big\{ \|M\|_{(N)} : M_{\omega_i} = y_i,\ i \le n \big\}.$$

We denote the nuclear norm by $\|\cdot\|_{(N)}$ here and throughout this paper. This procedure, analyzed
in [2, 3, 4, 11] among others, is parallel to the replacement of the $\ell_0$ penalty by the $\ell_1$ penalty in
solving the sparse recovery problem in a linear space.
In this paper, we focus on the problem of matrix completion with noisy observations (1) and take the
exact recovery as a special case. Since the exact constraint is no longer appropriate in the presence
of noise, the penalized squared error $\sum_{i=1}^n (M_{\omega_i} - y_i)^2$ is considered. By reformulating the problem in
Lagrange form, [8] proposed the spectrum Lasso

$$\hat\Theta^{(\mathrm{MHT})} = \arg\min_M \Big\{ \sum_{i=1}^n M_{\omega_i}^2/2 - \sum_{i=1}^n y_i M_{\omega_i} + \lambda\|M\|_{(N)} \Big\}, \tag{2}$$

along with an iterative convex minimization algorithm. However, (2) is difficult to analyze when the
sample fraction $\pi_0 = n/(d_1d_2)$ is small, due to the ill-posedness of the quadratic term $\sum_{i=1}^n M_{\omega_i}^2/2$.
This has led to two alternatives in [7] and [9]. While [9] proposed to minimize (2) under an additional
$\ell_\infty$ constraint on $M$, [7] modified (2) by replacing the quadratic term $\sum_{i=1}^n M_{\omega_i}^2$ with $\pi_0\|M\|_{(F)}^2$.
Both [7, 9] provided nearly optimal error bounds when the noise level is of no smaller order than
the $\ell_\infty$ norm of the target matrix $\Theta$, but not of smaller order, especially not for exact recovery. In
a different approach, [6] proposed a non-convex recursive algorithm and provided error bounds in
proportion to the noise level. However, the procedure requires the knowledge of the rank $r$ of the
unknown $\Theta$ and the error bound is optimal only when $d_1$ and $d_2$ are of the same order.

Our goal is to develop an algorithm for matrix completion that can be as easily computed as the
spectrum Lasso (2) and enjoys a nearly optimal error bound proportional to the noise level to
continuously cover both the noisy and noiseless cases. We propose to use an elastic penalty, a linear
combination of the nuclear and Frobenius norms, which leads to the estimator

$$\tilde\Theta = \arg\min_M \Big\{ \sum_{i=1}^n M_{\omega_i}^2/2 - \sum_{i=1}^n y_i M_{\omega_i} + \lambda_1\|M\|_{(N)} + (\lambda_2/2)\|M\|_{(F)}^2 \Big\}, \tag{3}$$

where $\|\cdot\|_{(N)}$ and $\|\cdot\|_{(F)}$ are the nuclear and Frobenius norms, respectively. We call (3) the spectrum
elastic net (E-net) since it is parallel to the E-net in linear regression, the least squares estimator
with a sum of the $\ell_1$ and $\ell_2$ penalties, introduced in [15]. Here the nuclear penalty provides the
sparsity in the spectrum, while the Frobenius penalty regularizes the inversion of the quadratic term.
Meanwhile, since the Frobenius penalty roughly shrinks the estimator by a factor $\pi_0/(\pi_0 + \lambda_2)$, we
correct this bias by a calibration step,

$$\hat\Theta = (1 + \lambda_2/\pi_0)\tilde\Theta. \tag{4}$$

We call this estimator the calibrated spectrum E-net.
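The shrinkage factor can be motivated by a rough heuristic, which is an illustration added here and not part of the formal analysis. Assume uniform sampling and mean-zero noise, replace the quadratic and linear terms of (3) by their expectations $\pi_0\|M\|_{(F)}^2/2$ and $\pi_0\langle\Theta, M\rangle$, and drop the nuclear penalty. The resulting minimizer is a shrunken copy of $\Theta$, and the calibration factor in (4) undoes exactly this shrinkage:

```latex
\begin{aligned}
M^\circ &= \arg\min_M \Big\{ \tfrac{\pi_0}{2}\|M\|_{(F)}^2 - \pi_0\langle \Theta, M\rangle
          + \tfrac{\lambda_2}{2}\|M\|_{(F)}^2 \Big\}
  \;\Longrightarrow\; (\pi_0 + \lambda_2)\, M^\circ = \pi_0\, \Theta, \\
M^\circ &= \frac{\pi_0}{\pi_0 + \lambda_2}\,\Theta,
  \qquad
  \Big(1 + \frac{\lambda_2}{\pi_0}\Big) M^\circ
  = \frac{\pi_0 + \lambda_2}{\pi_0}\cdot\frac{\pi_0}{\pi_0 + \lambda_2}\,\Theta = \Theta .
\end{aligned}
```

The symbol $M^\circ$ is introduced only for this sketch.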

Motivated by [8], we develop an EM algorithm to solve (3) for matrix completion. The algorithm
iteratively replaces the missing entries with those obtained from a scaled soft-thresholding singular
value decomposition (SVD) until the resulting matrix converges. This EM algorithm is guaranteed
to converge to the solution of (3).

Under proper coherence conditions, we prove that for suitable penalty levels $\lambda_1$ and $\lambda_2$, the
calibrated spectrum E-net (4) achieves a desired error bound in the Frobenius norm. Our error bound
is of nearly optimal order and in proportion to the noise level. This provides a sharper result than
those of [7, 9] when the noise level is of smaller order than the $\ell_\infty$ norm of $\Theta$, and than that of [6]
when $d_2/d_1$ is large. Our simulation results support the use of the calibrated spectrum E-net. They
illustrate that (4) performs comparably to (2) and outperforms the modified method of [7].

Our analysis of the calibrated spectrum E-net uses an inequality similar to a dual certificate bound
in [3]. The bound in [3] requires sample size $n \asymp \min\{r^2\log d,\ r(\log d)^5\}\, d\log d$, where $d =
d_1 + d_2$. We use the method of moments to remove a $\log d$ factor in the first component of their
sample size requirement. This leads to a sample size requirement of $n \asymp r^2 d\log d$, with an extra $r$
in comparison to the ideal $n \asymp rd\log d$. Since the extra $r$ does not appear in our error bound, its
appearance in the sample size requirement seems to be a technicality.

The rest of the paper is organized as follows. In Section 2, we describe an iterative algorithm for the 
computation of the spectrum E-net and study its convergence. In Section 3, we derive error bounds 
for the calibrated spectrum E-net. Some simulation results are presented in Section 4. Section 5 
provides the proof of our main result. 

We use the following notation throughout this paper. For matrices $M \in \mathbb{R}^{d_1\times d_2}$, $\|M\|_{(N)}$ is the
nuclear norm (the sum of all singular values of $M$), $\|M\|_{(S)}$ is the spectrum norm (the largest
singular value), $\|M\|_{(F)}$ is the Frobenius norm (the $\ell_2$ norm of vectorized $M$), and $\|M\|_\infty =
\max_{jk}|M_{jk}|$. Linear mappings from $\mathbb{R}^{d_1\times d_2}$ to $\mathbb{R}^{d_1\times d_2}$ are denoted by calligraphic letters. For
a linear mapping $\mathcal{Q}$, the operator norm is $\|\mathcal{Q}\|_{(op)} = \sup_{\|M\|_{(F)}=1}\|\mathcal{Q}M\|_{(F)}$. We equip $\mathbb{R}^{d_1\times d_2}$
with the inner product $\langle M_1, M_2\rangle = \mathrm{trace}(M_1^\top M_2)$ so that $\langle M, M\rangle = \|M\|_{(F)}^2$. For projections
$\mathcal{P}$, $\mathcal{P}^\perp = \mathcal{I} - \mathcal{P}$ with $\mathcal{I}$ being the identity. We denote by $E_\omega$ the unit matrix with 1 at $\omega \in
\{1,\ldots,d_1\}\times\{1,\ldots,d_2\}$, and by $\mathcal{P}_\omega$ the projection to $E_\omega$: $M \to M_\omega E_\omega = \langle E_\omega, M\rangle E_\omega$.



2 An algorithm for spectrum elastic regularization 



We first present a lemma for the M-step of our iterative algorithm. 

Lemma 1 Suppose the matrix $W$ has rank $r$. The solution to the optimization problem

$$\arg\min_Z \big\{ \|Z - W\|_{(F)}^2/2 + \lambda_1\|Z\|_{(N)} + \lambda_2\|Z\|_{(F)}^2/2 \big\}$$

is given by $S(W; \lambda_1, \lambda_2) = UD_{\lambda_1,\lambda_2}V^\top$ with $D_{\lambda_1,\lambda_2} = \mathrm{diag}\{(d_1-\lambda_1)_+, \ldots, (d_r-\lambda_1)_+\}/(1 + \lambda_2)$,
where $UDV^\top$ is the SVD of $W$, $D = \mathrm{diag}\{d_1, \ldots, d_r\}$ and $t_+ = \max(t, 0)$.

The minimization problem in Lemma 1 is solved by a scaled soft-thresholding SVD. This is parallel
to Lemma 1 in [8] and justified by Remark 1 there. We use Lemma 1 to solve the M-step of the EM
algorithm for the spectrum E-net (3).
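As an illustration of Lemma 1, the following is a minimal NumPy sketch of the scaled soft-thresholding SVD. The function names `scaled_soft_threshold_svd` and `objective` are hypothetical helpers introduced here, not names from the paper or from [8]; the small random example only checks that random perturbations do not decrease the objective.

```python
import numpy as np

def scaled_soft_threshold_svd(W, lam1, lam2):
    """Return S(W; lam1, lam2) = U diag((d_j - lam1)_+ / (1 + lam2)) V^T."""
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    d_new = np.maximum(d - lam1, 0.0) / (1.0 + lam2)  # soft-threshold, then scale
    return (U * d_new) @ Vt                           # scales the columns of U

def objective(Z, W, lam1, lam2):
    """Objective of Lemma 1 evaluated at Z (hypothetical helper for checking)."""
    fit = 0.5 * np.linalg.norm(Z - W, "fro") ** 2
    nuc = np.linalg.norm(Z, "nuc")
    ridge = 0.5 * lam2 * np.linalg.norm(Z, "fro") ** 2
    return fit + lam1 * nuc + ridge

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
Z_hat = scaled_soft_threshold_svd(W, lam1=0.5, lam2=0.2)
```

Thresholding by $\lambda_1$ sets small singular values to zero, which is the source of the low-rank solutions; dividing by $1 + \lambda_2$ is the Frobenius shrinkage that the calibration step (4) later corrects.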

We still need an E-step to impute a complete matrix given the observed data $\{y_i, \omega_i : i = 1, \ldots, n\}$.
Since $\omega_i$ are allowed to have ties, we need the following notation. Let $m_\omega = \#\{i : \omega_i = \omega, i \le n\}$
be the multiplicity of observations at $\omega \in \Omega^*$ and $m^* = \max_\omega m_\omega$ be the maximum multiplicity.
Suppose that the complete data is composed of $m_*$ observations at each $\omega$ for a certain integer $m_*$.
Let $\bar Y^{(com)}_\omega$ be the sample mean of the complete data at $\omega$ and $\bar Y^{(com)}$ be the matrix with components
$\bar Y^{(com)}_\omega$. If the complete data are available, (3) is equivalent to

$$\arg\min_M \big\{ (m_*/2)\|\bar Y^{(com)} - M\|_{(F)}^2 + \lambda_1\|M\|_{(N)} + (\lambda_2/2)\|M\|_{(F)}^2 \big\}.$$

Let $\bar Y^{(obs)}_\omega = m_\omega^{-1}\sum_{\omega_i = \omega} y_i$ be the sample mean of the observations at $\omega$ and $\bar Y^{(obs)} =
(\bar Y^{(obs)}_\omega)_{d_1\times d_2}$. In the white noise model, the conditional expectation of $\bar Y^{(com)}_\omega$ given $\bar Y^{(obs)}$ is
$(m_\omega/m_*)\bar Y^{(obs)}_\omega + (1 - m_\omega/m_*)\Theta_\omega$ for $m_\omega \le m_*$. This leads to a generalized E-step:

$$\bar Y^{(imp)} = (\bar Y^{(imp)}_\omega)_{d_1\times d_2}, \quad \bar Y^{(imp)}_\omega = \min\{1, (m_\omega/m_*)\}\bar Y^{(obs)}_\omega + (1 - m_\omega/m_*)_+ Z^{(old)}_\omega, \tag{5}$$

where $Z^{(old)}$ is the estimation of $\Theta$ in the previous iteration. This is a genuine E-step when $m_* = m^*$
but also allows a smaller $m_*$ to reduce the proportion of missing data.
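The generalized E-step (5) can be sketched as below. This is a minimal illustration under the assumption that positions are zero-based `(row, column)` pairs; the function name `e_step` and its argument names are hypothetical.

```python
import numpy as np

def e_step(omega, y, Z_old, m_star):
    """Generalized E-step (5): blend observed cell means with the previous estimate."""
    d1, d2 = Z_old.shape
    counts = np.zeros((d1, d2))   # multiplicities m_omega
    sums = np.zeros((d1, d2))     # running sums of y_i per cell
    for (a, b), yi in zip(omega, y):
        counts[a, b] += 1
        sums[a, b] += yi
    # Sample mean of the observations per cell; unobserved cells are set to 0.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    ratio = counts / m_star
    w_obs = np.minimum(1.0, ratio)        # min{1, m_omega / m_*}
    w_old = np.maximum(1.0 - ratio, 0.0)  # (1 - m_omega / m_*)_+
    return w_obs * means + w_old * Z_old

# Two observations at cell (0, 0), one at cell (1, 1), previous estimate 10 everywhere.
Z_old = np.full((2, 2), 10.0)
Y_imp = e_step([(0, 0), (0, 0), (1, 1)], [1.0, 3.0, 5.0], Z_old, m_star=2)
```

With `m_star=2`, the fully observed cell keeps its sample mean, the cell observed once is an equal mix of its observation and the previous estimate, and unobserved cells are filled by the previous estimate.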

We now present the EM-algorithm for the computation of the spectrum E-net $\tilde\Theta$ in (3).

Algorithm 1 Initialize with $Z^{(0)}$ and $k = 0$. Repeat the following steps:

• E-step: Compute $\bar Y^{(imp)}$ in (5) with $Z^{(old)} = Z^{(k)}$ and assign $k \leftarrow k + 1$,

• M-step: Compute $Z^{(k)} = S(\bar Y^{(imp)}; \lambda_1/m_*, \lambda_2/m_*)$,

until $\|Z^{(k)} - Z^{(k-1)}\|_{(F)} \le \epsilon$. Then, return $Z^{(k)}$.

The following theorem states the convergence of Algorithm 1.

Theorem 1 As $k \to \infty$, $Z^{(k)}$ converges to a limit $Z^{(\infty)}$ as a function of the data and $(\lambda_1, \lambda_2, m_*)$,
and $Z^{(\infty)} = \tilde\Theta$ for $m_* \ge m^*$.
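The iteration can be sketched as follows for the simplest setting. This is a hedged illustration, not the authors' implementation: it assumes no ties among the sampled positions (so $m_* = m^* = 1$), zero-based positions, initialization at the zero-filled observed matrix, and an absolute Frobenius-norm stopping rule. The names `soft_svd`, `spectrum_enet` and `calibrate` are hypothetical.

```python
import numpy as np

def soft_svd(W, lam1, lam2):
    """Scaled soft-thresholding SVD S(W; lam1, lam2) of Lemma 1."""
    U, d, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * (np.maximum(d - lam1, 0.0) / (1.0 + lam2))) @ Vt

def spectrum_enet(omega, y, shape, lam1, lam2, tol=1e-8, max_iter=5000):
    """EM iterations for (3), assuming distinct positions (m_* = m^* = 1)."""
    mask = np.zeros(shape, dtype=bool)
    Y_obs = np.zeros(shape)
    for (a, b), yi in zip(omega, y):
        mask[a, b] = True
        Y_obs[a, b] = yi
    Z = Y_obs.copy()                          # assumed initialization Z^(0)
    for _ in range(max_iter):
        Y_imp = np.where(mask, Y_obs, Z)      # E-step: fill missing cells with Z
        Z_new = soft_svd(Y_imp, lam1, lam2)   # M-step with m_* = 1
        done = np.linalg.norm(Z_new - Z, "fro") <= tol
        Z = Z_new
        if done:
            break
    return Z

def calibrate(Z, lam2, pi0):
    """Calibration step (4): undo the Frobenius shrinkage."""
    return (1.0 + lam2 / pi0) * Z
```

For $\lambda_2 > 0$ each sweep is a contraction in the Frobenius norm with factor at most $1/(1 + \lambda_2)$, since the imputation and the soft-thresholding are both nonexpansive; this is one way to see why the loop terminates in this sketch.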
Theorem 1 is a variation of a parallel result in [8] and follows from the same proof there. As [8]
pointed out, a main advantage of Algorithm 1 is the speed of each iteration. When the maximum
multiplicity $m^*$ is small, we simply use $Z^{(0)} = \bar Y^{(obs)}$ and $m_* = m^*$; otherwise, we may first run
the EM-algorithm for an $m_* < m^*$ and use the output as the initialization $Z^{(0)}$ for a second run of
the EM-algorithm with $m_* = m^*$.

3 Analysis of estimation accuracy 

In this section, we derive error bounds for the calibrated spectrum E-net. We need the following
notation. Let $r = \mathrm{rank}(\Theta)$, $UDV^\top$ be the SVD of $\Theta$, and $s_1 \ge \cdots \ge s_r$ be the nonzero singular
values of $\Theta$. Let $T$ be the tangent space with respect to $UV^\top$, the space of all matrices of the form
$UU^\top M_1 + M_2VV^\top$. The orthogonal projection to $T$ is given by

$$\mathcal{P}_T M = UU^\top M + MVV^\top - UU^\top MVV^\top. \tag{6}$$
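The projection (6) is easy to check numerically. The sketch below, with the hypothetical helper name `tangent_projection`, builds a random rank-$r$ matrix and applies the projection to a random $M$; the properties one would expect of an orthogonal projection onto $T$ (idempotence, $\mathcal{P}_T\Theta = \Theta$, rank at most $2r$) can then be verified directly.

```python
import numpy as np

def tangent_projection(M, U, V):
    """Orthogonal projection (6) onto the tangent space T spanned via U and V."""
    PU = U @ U.T   # projection onto the column space of Theta
    PV = V @ V.T   # projection onto the row space of Theta
    return PU @ M + M @ PV - PU @ M @ PV

rng = np.random.default_rng(1)
d1, d2, r = 8, 6, 2
Theta = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
U_full, s, Vt_full = np.linalg.svd(Theta, full_matrices=False)
U, V = U_full[:, :r], Vt_full[:r].T   # leading r singular vectors
M = rng.standard_normal((d1, d2))
PM = tangent_projection(M, U, V)
```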

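Theorem 2 Let $\xi = 1 + \lambda_2/\pi_0$ and $\mathcal{H} = \sum_{i=1}^n \mathcal{P}_{\omega_i}$. Define

$$\mathcal{R} = (\mathcal{H} - \pi_0)\mathcal{P}_T/(\pi_0 + \lambda_2),$$

$$\Delta = \mathcal{R}(\lambda_2\Theta + \lambda_1UV^\top),$$

$$\mathcal{Q} = \mathcal{I} - \mathcal{H}(\mathcal{P}_T\mathcal{H}\mathcal{P}_T + \lambda_2\mathcal{P}_T)^{-1}\mathcal{P}_T.$$

Let $\varepsilon = \sum_{i=1}^n \varepsilon_iE_{\omega_i}$. Suppose

$$\|\mathcal{P}_T\mathcal{R}\|_{(op)} \le 1/2, \quad s_r \ge 5\lambda_1/\lambda_2, \tag{7}$$

$$\|\mathcal{P}_T\Delta\|_{(F)} \le \sqrt{r}\lambda_1/8, \quad \|\Delta - \mathcal{R}(\mathcal{P}_T\mathcal{R} + \mathcal{P}_T)^{-1}\mathcal{P}_T\Delta\|_{(S)} \le \lambda_1/4, \tag{8}$$

$$\|\mathcal{P}_T\varepsilon\|_{(F)} \le \sqrt{r}\lambda_1/8, \quad \|\mathcal{Q}\varepsilon\|_{(S)} \le 3\lambda_1/4, \quad \|\mathcal{P}_T^\perp\varepsilon\|_{(S)} \le \lambda_1. \tag{9}$$

Then the calibrated spectrum E-net (4) satisfies

$$\|\hat\Theta - \Theta\|_{(F)} \le 2\sqrt{r}\lambda_1/\pi_0. \tag{10}$$

The proof of Theorem 2 is provided in Section 5. When $\omega_i$ are random entries in $\Omega^*$, $E\mathcal{H} = \pi_0\mathcal{I}$,
so that (8) and the first inequality of (7) are expected to hold under proper conditions. Since the
rank of $\mathcal{P}_T\varepsilon$ is no greater than $2r$, (9) essentially requires $\|\varepsilon\|_{(S)} \asymp \lambda_1$. Our analysis allows $\lambda_2$ to
lie in a certain range $[\lambda_*, \lambda^*]$, and $\lambda^*/\lambda_*$ is large under proper conditions. Still, the choice of $\lambda_2$ is
constrained by (7) and (8) since $\Delta$ is linear in $\lambda_2$. When $\lambda_2/\pi_0$ diverges to infinity, the calibrated
spectrum E-net (4) becomes the modified spectrum Lasso of [7].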

Theorem 2 provides sufficient conditions on the target matrix and the noise for achieving a
certain level of estimation error. Intuitively, these conditions on the target matrix $\Theta$ must imply a
certain level of coherence (or flatness) of the unknown matrix since it is impossible to distinguish
the unknown from zero when the observations are completely outside its support. In [2, 3, 4, 11],
coherence conditions are imposed on

$$\mu_0 = \max\{(d_1/r)\|UU^\top\|_\infty, (d_2/r)\|VV^\top\|_\infty\}, \quad \mu_1 = \sqrt{d_1d_2/r}\,\|UV^\top\|_\infty, \tag{11}$$

where $U$ and $V$ are matrices of singular vectors of $\Theta$. [9] considered a more general notion of
spikiness of a matrix $M$, defined as the ratio between the $\ell_\infty$ and dimension-normalized $\ell_2$ norms,

$$\alpha_{sp}(M) = \|M\|_\infty\sqrt{d_1d_2}/\|M\|_{(F)}. \tag{12}$$
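The quantities in (11) and (12) can be computed directly from an SVD. The sketch below is illustrative only; the function names `coherence` and `spikiness` are hypothetical, and the normalization $\sqrt{d_1d_2/r}$ for $\mu_1$ is the one assumed in (11) as written here. A perfectly flat matrix attains the smallest possible values, while a single-spike matrix has spikiness $\sqrt{d_1d_2}$.

```python
import numpy as np

def coherence(Theta, r):
    """Coherence constants (mu_0, mu_1) of (11) from the leading r singular vectors."""
    U_full, _, Vt_full = np.linalg.svd(Theta, full_matrices=False)
    U, V = U_full[:, :r], Vt_full[:r].T
    d1, d2 = Theta.shape
    mu0 = max(d1 / r * np.abs(U @ U.T).max(), d2 / r * np.abs(V @ V.T).max())
    mu1 = np.sqrt(d1 * d2 / r) * np.abs(U @ V.T).max()
    return mu0, mu1

def spikiness(M):
    """Spikiness ratio (12): max-norm over dimension-normalized Frobenius norm."""
    d1, d2 = M.shape
    return np.abs(M).max() * np.sqrt(d1 * d2) / np.linalg.norm(M, "fro")

flat = np.ones((4, 5))        # rank one, all entries equal
spike = np.zeros((4, 5))
spike[0, 0] = 1.0             # all mass in a single entry
```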

Suppose in the rest of the section that $\omega_i$ are iid points uniformly distributed in $\Omega^*$ and $\varepsilon_i$ are iid
$N(0, \sigma^2)$ variables independent of $\{\omega_i\}$. The following theorem asserts that under certain coherence
conditions on the matrices $\Theta$, $UU^\top$, $VV^\top$ and $UV^\top$, all conditions of Theorem 2 hold with large
probability when the sample size $n$ is of the order $r^2d\log d$.

Theorem 3 Let $d = d_1 + d_2$. Consider $\lambda_1$ and $\lambda_2$ satisfying

$$\lambda_1 = \sigma\sqrt{8\pi_0 d\log d}, \quad 1 \le \frac{\lambda_2\|\Theta\|_{(F)}}{\lambda_1\{n/(d\log d)\}^{1/4}} \le 2. \tag{13}$$

Then, there exists a constant $C$ such that

$$n \ge C\max\big\{\mu_0^2r^2d\log d,\ (\mu_1 + r)\mu_1rd\log d,\ (\alpha_{sp}^4 \vee \kappa_*^4)r^2d\log d\big\} \tag{14}$$

implies

$$\|\hat\Theta - \Theta\|_{(F)}^2/(d_1d_2) \le 32(\sigma^2rd\log d)/n$$

with probability at least $1 - 1/d$, where $\mu_0$ and $\mu_1$ are the coherence constants in (11), $\alpha_{sp} =
\alpha_{sp}(\Theta)$ is the spikiness of $\Theta$ and $\kappa_* = \|\Theta\|_{(F)}/(r^{1/2}s_r)$.

We require the knowledge of the noise level $\sigma$ to determine the penalty level, which is usually
considered as a tuning parameter in practice. The Frobenius norm $\|\Theta\|_{(F)}$ in (13) can be replaced
by an estimate of the same magnitude in Theorem 3. In our simulation experiment, we use
$\lambda_2 = \lambda_1\{n/(d\log d)\}^{1/4}/\hat F$ with $\hat F = (\sum_{i=1}^n y_i^2/\pi_0)^{1/2}$. The Chebyshev inequality provides
$\hat F/\|\Theta\|_{(F)} \to 1$ when $\alpha_{sp} = O(1)$ and $\sigma \ll \|\Theta\|_\infty$.
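The penalty levels described above reduce to a few lines of arithmetic. The following sketch assumes the noise level `sigma` is known and uses the plug-in estimate $\hat F$ in place of $\|\Theta\|_{(F)}$; the function name `penalty_levels` is hypothetical.

```python
import math

def penalty_levels(y, d1, d2, sigma):
    """Return (lam1, lam2, F_hat) from the observations y and the noise level sigma."""
    n = len(y)
    d = d1 + d2
    pi0 = n / (d1 * d2)                                   # sample fraction
    lam1 = sigma * math.sqrt(8.0 * pi0 * d * math.log(d))
    F_hat = math.sqrt(sum(v * v for v in y) / pi0)        # plug-in for ||Theta||_(F)
    lam2 = lam1 * (n / (d * math.log(d))) ** 0.25 / F_hat
    return lam1, lam2, F_hat

# Example: 2000 observations of a 100 x 100 matrix whose observed entries all equal 1.
lam1, lam2, F_hat = penalty_levels([1.0] * 2000, 100, 100, sigma=0.5)
```

In this toy example $\pi_0 = 0.2$ and $\hat F = (2000/0.2)^{1/2} = 100$, which is the Frobenius norm of the all-ones $100\times100$ matrix.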

A key element in our analysis is to find a probabilistic bound for the second inequality of (8), or
equivalently an upper bound for

$$P\big\{\|\mathcal{R}(\mathcal{P}_T\mathcal{R} + \mathcal{P}_T)^{-1}(\lambda_2\Theta + \lambda_1UV^\top)\|_{(S)} > \lambda_1/4\big\}. \tag{15}$$

This guarantees the existence of a primal dual certificate for the spectrum E-net penalty [14].
For $\lambda_2 = 0$, a similar inequality was proved in [3], where the sample size requirement is
$n \ge C_0\min\{\mu^4(\log d)^2d,\ \mu^2r(\log d)^6d\}$ for a certain coherence factor $\mu$. We remove a log
factor in the first bound, resulting in the sample size requirement in (14), which is optimal when
$r = O(1)$. For exact recovery in the noiseless case, the sample size $n \asymp rd(\log d)^2$ is sufficient if
a golfing scheme is used to construct an approximate dual certificate [4, 11]. We use the following
lemma to bound (15).

Lemma 2 Let $\mathcal{H} = \sum_{i=1}^n \mathcal{P}_{\omega_i}$ where $\omega_i$ are iid points uniformly distributed in $\Omega^*$. Let $\mathcal{R} =
(\mathcal{H} - \pi_0)\mathcal{P}_T/(\pi_0 + \lambda_2)$ and $\xi = 1 + \lambda_2/\pi_0$. Let $M$ be a deterministic matrix. Then, there exists a
numerical constant $C$ such that, for all $k \ge 1$ and $m \ge 1$,

{\ km / \ 2m 

Cf,ydkm/nj [ti^Wd;d^/r)\\M\\^j . (16) 

We use a different graphical approach than those in [3] to bound $E\,\mathrm{trace}(\{(\mathcal{R}^kM)^\top(\mathcal{R}^kM)\}^m)$ in
the proof of Lemma 2. The rest of the proof of Theorem 3 can be outlined as follows. Assume
that all coherence factors are $O(1)$. Let $M = \lambda_2\Theta + \lambda_1UV^\top$ and write $\mathcal{R}(\mathcal{P}_T\mathcal{R} + \mathcal{P}_T)^{-1}M =
\mathcal{R}M - \mathcal{R}^2M + \cdots + (-1)^{k^*-1}\mathcal{R}^{k^*}M + \mathrm{Rem}$. By (16) with $km \asymp \log d$ for $k \ge 2$ and an even simpler
bound for $k = 1$ and Rem, (15) holds when $\sqrt{d_1d_2/r}\,\|M\|_\infty \lesssim \lambda_1/\eta$, where $\eta \asymp r^2d(\log d)/n$.
Since $\alpha_{sp} + \mu_1 + \|\Theta\|_{(F)}/(\sqrt{r}s_r) = O(1)$, this is equivalent to $\eta(s_r\lambda_2/\lambda_1 + 1) \lesssim 1$. Finally, we use
matrix exponential inequalities [10, 12] to verify other conditions of Theorem 2. We omit technical
details of the proof of Lemma 2 and Theorem 3. We would like to point out that if the $r$ in (16) can
be replaced by $r(\log d)^\gamma$, e.g. $\gamma = 5$ in view of [3], the rest of the proof of Theorem 3 is intact with
$\eta \asymp rd(\log d)^{1+\gamma}/n$ and a proper adjustment of $\lambda_2$ in (13).



Compared with [7] and [9], the main advantage of Theorem 3 is the proportionality of its error
bound to the noise level. In [7], the quadratic term $\sum_{i=1}^n M_{\omega_i}^2$ is replaced by its expectation
$\pi_0\|M\|_{(F)}^2$ and the resulting minimizer is proved to satisfy

$$\|\hat\Theta^{(\mathrm{KLT})} - \Theta\|_{(F)}^2/(d_1d_2) \le C\max(\sigma^2, \|\Theta\|_\infty^2)\,rd(\log d)/n \tag{17}$$

with large probability, where $C$ is a numerical constant. This error bound achieves the squared error
rate $\sigma^2rd(\log d)/n$ as in Theorem 3 when the noise level $\sigma$ is of no smaller order than $\|\Theta\|_\infty$, but
not of smaller order. In particular, (17) does not imply exact recovery when $\sigma = 0$. In Theorem 3,
the error bound converges to zero as the noise level diminishes, implying exact recovery in the
noiseless case. In [9], a constrained spectrum Lasso was proposed that minimizes (2) subject to
$\|M\|_\infty \le \alpha^*/\sqrt{d_1d_2}$. For $\|\Theta\|_{(F)} \le 1$ and $\alpha_{sp}(\Theta) \le \alpha^*$, [9] proved

$$\|\hat\Theta^{(\mathrm{NW})} - \Theta\|_{(F)}^2 \le C\max(d_1d_2\sigma^2, 1)(\alpha^*)^2rd(\log d)/n \tag{18}$$

with large probability. Scale change from the above error bound yields

$$\|\hat\Theta^{(\mathrm{NW})} - \Theta\|_{(F)}^2/(d_1d_2) \le C\max\{\sigma^2, \|\Theta\|_{(F)}^2/(d_1d_2)\}(\alpha^*)^2rd(\log d)/n.$$

Since $\alpha^* \ge 1$ and $\alpha^*\|\Theta\|_{(F)}/\sqrt{d_1d_2} \ge \|\Theta\|_\infty$, the right-hand side of (18) is of no smaller order
than that of (17). We shall point out that (17) and (18) only require sample size $n \asymp rd\log d$. In
addition, [9] allows more practical weighted sampling models.
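The difference between the two kinds of bounds is easy to see numerically. The sketch below evaluates the right-hand side of the Theorem 3 bound and of (17) for a given noise level; the unspecified numerical constant $C$ in (17) is set to 1 purely as an assumption for illustration, and the function names are hypothetical.

```python
import math

def rate(d1, d2, r, n):
    """Common rate r d log(d) / n with d = d1 + d2."""
    d = d1 + d2
    return r * d * math.log(d) / n

def bound_theorem3(sigma, d1, d2, r, n):
    """Right-hand side of the Theorem 3 bound: 32 sigma^2 r d log(d) / n."""
    return 32.0 * sigma ** 2 * rate(d1, d2, r, n)

def bound_klt(sigma, theta_max, d1, d2, r, n, C=1.0):
    """Right-hand side of (17); C = 1 is an assumed placeholder constant."""
    return C * max(sigma ** 2, theta_max ** 2) * rate(d1, d2, r, n)

# As sigma decreases, the first bound vanishes while the second stalls.
for sigma in (1.0, 0.1, 0.0):
    b3 = bound_theorem3(sigma, 100, 100, 10, 2000)
    b17 = bound_klt(sigma, 1.0, 100, 100, 10, 2000)
    print(f"sigma={sigma}: Theorem 3 bound={b3:.4f}, bound (17)={b17:.4f}")
```

Only the dependence on $\sigma$ is meaningful here; the absolute values depend on the assumed constant and on whether the sample size condition (14) holds.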

Compared with [6], the main advantage of Theorem 3 is the independence of its sample size
requirement on the aspect ratio $d_2/d_1$, where $d_2 \ge d_1$ is assumed without loss of generality by symmetry.
The error bound in [6] implies

$$\|\hat\Theta^{(\mathrm{KMO})} - \Theta\|_{(F)}^2/(d_1d_2) \le C_0(s_1/s_r)^4\sigma^2rd(\log d)/n \tag{19}$$

for sample size $n \ge C_1rd\log d + C_2r^2d\sqrt{d_2/d_1}$, where $\{C_1, C_2\}$ are constants depending on the
same set of coherence factors as in (14) and $s_1 \ge \cdots \ge s_r$ are the singular values of $\Theta$. Therefore,
Theorem 3 effectively replaces the root aspect ratio $\sqrt{d_2/d_1}$ in the sample size requirement of (19)
with a log factor, and removes the coherence factor $(s_1/s_r)^4$ on the right-hand side of (19). We
note that $s_1/s_r$ is a larger coherence factor than $\|\Theta\|_{(F)}/(r^{1/2}s_r)$ in the sample size requirement in
Theorem 3. The root aspect ratio can be removed from the sample size requirement for (19) if $\Theta$
can be divided into square blocks uniformly satisfying the coherence conditions.



4 Simulation study 

This experiment has the same setting as in Section 9 of [8]. We provide the description of the
simulation settings in our notation as follows. The target matrix is $\Theta = UV^\top$, where $U_{d_1\times r}$ and
$V_{d_2\times r}$ are random matrices with independent standard normal entries. The sampling points $\omega_i$ have
no tie and $\Omega = \{\omega_i : i = 1, \ldots, n\}$ is a uniformly distributed random subset of $\{1, \ldots, d_1\}\times
\{1, \ldots, d_2\}$, where $n$ is fixed. The errors $\varepsilon_i$ are iid $N(0, \sigma^2)$ variables. Thus, the observed matrix is
$Y = \mathcal{P}_\Omega(\Theta + \varepsilon)$ with $\mathcal{P}_\Omega = \mathcal{H} = \sum_{i=1}^n \mathcal{P}_{\omega_i}$ being a projection. The signal to noise ratio (SNR) is
defined as $\mathrm{SNR} = \sqrt{r}/\sigma$.
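The data-generating mechanism described above can be sketched as follows. This is an illustrative reconstruction of the setting, not the code used for the reported results; the function name `simulate` and the seed handling are assumptions.

```python
import numpy as np

def simulate(d1, d2, r, n, snr, seed=0):
    """Draw Theta = U V^T, sample n distinct entries uniformly, and add Gaussian noise."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d1, r))
    V = rng.standard_normal((d2, r))
    Theta = U @ V.T
    sigma = np.sqrt(r) / snr                               # from SNR = sqrt(r) / sigma
    flat = rng.choice(d1 * d2, size=n, replace=False)      # no ties among positions
    rows, cols = np.unravel_index(flat, (d1, d2))
    y = Theta[rows, cols] + sigma * rng.standard_normal(n)
    Y = np.zeros((d1, d2))
    Y[rows, cols] = y                                      # observed matrix, zero elsewhere
    mask = np.zeros((d1, d2), dtype=bool)
    mask[rows, cols] = True
    return Theta, Y, mask, sigma

Theta, Y, mask, sigma = simulate(d1=100, d2=100, r=10, n=2000, snr=1.0)
```

Each entry of $UV^\top$ is a sum of $r$ products of independent standard normals and so has variance $r$, which is why $\sqrt{r}/\sigma$ is a natural signal to noise ratio in this setting.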

We compare the calibrated spectrum E-net (4) with the spectrum Lasso (2) and its modification
$\hat\Theta^{(\mathrm{KLT})}$ of [7]. For all methods, we compute a series of estimators with 100 different penalty
levels, where the smallest penalty level corresponds to a full-rank solution and the largest penalty
level corresponds to a zero solution. For the calibrated spectrum E-net, we always use $\lambda_2 =
\lambda_1\{n/(d\log d)\}^{1/4}/\hat F$, where $\hat F = (\sum_{i=1}^n y_i^2/\pi_0)^{1/2}$ is an estimator for $\|\Theta\|_{(F)}$. We plot the
training errors and test errors as functions of estimated ranks, where the training and test errors are
defined as

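$$\text{Training error} = \frac{\|\mathcal{P}_\Omega(\hat\Theta - Y)\|_{(F)}^2}{\|\mathcal{P}_\Omega Y\|_{(F)}^2}, \qquad \text{Test error} = \frac{\|\mathcal{P}_\Omega^\perp(\hat\Theta - \Theta)\|_{(F)}^2}{\|\mathcal{P}_\Omega^\perp\Theta\|_{(F)}^2}.$$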

In Figure 1, we report the estimation performance of three methods. The rank of $\Theta$ is 10 but
$\{\Theta, \Omega, \varepsilon\}$ are regenerated in each replication. Different noise levels and proportions of the
observed entries are considered. All the results are averaged over 50 replications. In this experiment,
the calibrated spectrum E-net and the spectrum Lasso estimator have very close testing and training
errors, and both of them significantly outperform the modified Lasso. Figure 1 also illustrates that
in most cases, the calibrated spectrum E-net and spectrum Lasso achieve the optimal test error when
the estimated rank is around the true rank.

We note that the constrained spectrum Lasso estimator $\hat\Theta^{(\mathrm{NW})}$ would have the same performance as
the spectrum Lasso when the constraint $\alpha_{sp}(\Theta) \le \alpha^*$ is set with a sufficiently high $\alpha^*$. However,
analytic properties of the spectrum Lasso are unclear without constraint or modification.

[Figure 1 panels: $\pi_0 = 0.2$, SNR = 1; $\pi_0 = 0.2$, SNR = 10. Horizontal axis: Rank.]

Figure 1: Plots of training and testing errors against the estimated rank: testing error with solid lines;
training error with dashed lines; spectrum Lasso in blue, calibrated spectrum E-net in red; modified
spectrum Lasso in black; $d_1 = d_2 = 100$, $\mathrm{rank}(\Theta) = 10$.

5 Proof of Theorem 2

The proof of Theorem 2 requires the following proposition that controls the approximation error of
the Taylor expansion of the nuclear norm with subdifferentiation. The result, closely related to those
in [13], is used to control the variation of the tangent space of the spectrum E-net estimator. We omit
its proof.

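Proposition 1 Let $\Theta = UDV^\top$ be the SVD and $M$ be another matrix. Then,

$$0 \le \|M\|_{(N)} - \|\Theta\|_{(N)} - \|\mathcal{P}_T^\perp M\|_{(N)} - \langle UV^\top, M - \Theta\rangle$$

$$\le \|(\mathcal{P}_TM - \Theta)VD^{-1/2}\|_{(F)}^2 + \|D^{-1/2}U^\top(\mathcal{P}_TM - \Theta)\|_{(F)}^2.$$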

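Proof of Theorem 2. Define

$$\Theta^* = (\mathcal{P}_T\mathcal{H}\mathcal{P}_T + \lambda_2\mathcal{P}_T)^{-1}(\mathcal{P}_T\varepsilon + \mathcal{P}_T\mathcal{H}\Theta - \lambda_1UV^\top),$$

$$\bar\Theta = (\pi_0 + \lambda_2)^{-1}(\pi_0\Theta - \lambda_1UV^\top),$$

$$\tilde\Delta = \tilde\Theta - \Theta^*, \quad \Delta^* = \Theta^* - \bar\Theta, \quad \Delta_* = \tilde\Theta - \bar\Theta.$$

Since $\hat\Theta = \xi\tilde\Theta$ and $\xi\bar\Theta - \Theta = -(\lambda_1/\pi_0)UV^\top$,

$$\|\hat\Theta - \Theta\|_{(F)} \le \xi\|\Delta_*\|_{(F)} + \|\xi\bar\Theta - \Theta\|_{(F)} = \xi\|\Delta_*\|_{(F)} + \sqrt{r}\lambda_1/\pi_0 \tag{20}$$

$$\le \xi\|\tilde\Delta\|_{(F)} + \xi\|\Delta^*\|_{(F)} + \sqrt{r}\lambda_1/\pi_0. \tag{21}$$

We consider two cases by comparing $\lambda_2$ and $\pi_0$.

Case 1: $\lambda_2 \le \pi_0$. By algebra $\xi\Delta^* = \pi_0^{-1}(\mathcal{P}_T\mathcal{R} + \mathcal{P}_T)^{-1}(\mathcal{P}_T\varepsilon + \mathcal{P}_T\Delta)$, so that

$$\xi\|\Delta^*\|_{(F)} \le \pi_0^{-1}\|(\mathcal{P}_T\mathcal{R} + \mathcal{P}_T)^{-1}\|_{(op)}\|\mathcal{P}_T\Delta + \mathcal{P}_T\varepsilon\|_{(F)} \le \sqrt{r}\lambda_1/(2\pi_0). \tag{22}$$

The last inequality above follows from the first inequalities in (7), (8) and (9). It remains to bound
$\|\tilde\Delta\|_{(F)}$. Let $Y = \sum_{i=1}^n y_iE_{\omega_i}$. We write the spectrum E-net estimator (3) as

$$\tilde\Theta = \arg\min_M \big\{ \langle\mathcal{H}M, M\rangle/2 - \langle Y, M\rangle + \lambda_1\|M\|_{(N)} + (\lambda_2/2)\|M\|_{(F)}^2 \big\}.$$

It follows that for a certain member $\hat G$ in the sub-differential of $\|M\|_{(N)}$ at $M = \tilde\Theta$,

$$0 = \partial L_{\lambda_1,\lambda_2}(\tilde\Theta) = \mathcal{H}\tilde\Theta - Y + \lambda_2\tilde\Theta + \lambda_1\hat G = (\mathcal{H} + \lambda_2)\tilde\Delta + (\mathcal{H} + \lambda_2)\Theta^* - Y + \lambda_1\hat G.$$

Let $\mathrm{Rem}_1 = \|\Theta^*\|_{(N)} - \langle UV^\top, \Theta^*\rangle$. Since $\|\Theta^*\|_{(N)} - \|\tilde\Theta\|_{(N)} \ge -\langle\tilde\Delta, \hat G\rangle$, we have

$$\begin{aligned}
\langle(\mathcal{H} + \lambda_2)\tilde\Delta, \tilde\Delta\rangle &\le \langle\mathcal{H}\Theta + \varepsilon - (\mathcal{H} + \lambda_2)\Theta^*, \tilde\Delta\rangle + \lambda_1\|\Theta^*\|_{(N)} - \lambda_1\|\tilde\Theta\|_{(N)} \\
&= \langle\mathcal{H}(\Theta - \Theta^*) + \varepsilon - \lambda_2\Theta^*, \tilde\Delta\rangle + \lambda_1\mathrm{Rem}_1 + \lambda_1\langle UV^\top, \Theta^*\rangle - \lambda_1\|\tilde\Theta\|_{(N)} \\
&\le \lambda_1\mathrm{Rem}_1 + \langle\varepsilon + \mathcal{H}(\Theta - \Theta^*) - \lambda_2\Theta^* - \lambda_1UV^\top, \tilde\Delta\rangle - \lambda_1\|\mathcal{P}_T^\perp\tilde\Delta\|_{(N)} \\
&= \lambda_1\mathrm{Rem}_1 + \langle\varepsilon + \mathcal{H}(\Theta - \Theta^*), \mathcal{P}_T^\perp\tilde\Delta\rangle - \lambda_1\|\mathcal{P}_T^\perp\tilde\Delta\|_{(N)}.
\end{aligned} \tag{23}$$

The second inequality in (23) is due to $\|\tilde\Theta\|_{(N)} \ge \|\mathcal{P}_T^\perp\tilde\Theta\|_{(N)} + \langle UV^\top, \tilde\Theta\rangle$ and $\mathcal{P}_T^\perp\tilde\Theta = \mathcal{P}_T^\perp\tilde\Delta$. The
last equality in (23) follows from the definition of $\Theta^* \in T$, since it gives $\mathcal{P}_T\varepsilon + \mathcal{P}_T\mathcal{H}(\Theta - \Theta^*) -
\lambda_2\Theta^* - \lambda_1UV^\top = -(\mathcal{P}_T\mathcal{H}\mathcal{P}_T + \lambda_2\mathcal{P}_T)\Theta^* + \mathcal{P}_T\varepsilon + \mathcal{P}_T\mathcal{H}\Theta - \lambda_1UV^\top = 0$. By the definitions
of $\mathcal{Q}$, $\Theta^*$ and $\Delta$, $\varepsilon + \mathcal{H}(\Theta - \Theta^*) = \mathcal{Q}\varepsilon + \mathcal{H}(\Theta - \bar\Theta) - \mathcal{H}(\mathcal{P}_T\mathcal{H}\mathcal{P}_T + \lambda_2\mathcal{P}_T)^{-1}\mathcal{P}_T\Delta$. Since
$\mathcal{P}_T^\perp\mathcal{H}\mathcal{P}_T = \mathcal{P}_T^\perp(\mathcal{H} - \pi_0)\mathcal{P}_T = \mathcal{P}_T^\perp\mathcal{R}(\pi_0 + \lambda_2)$ and $(\mathcal{H} - \pi_0)(\Theta - \bar\Theta) = \Delta$, we find

$$\begin{aligned}
\langle\varepsilon + \mathcal{H}(\Theta - \Theta^*), \mathcal{P}_T^\perp\tilde\Delta\rangle
&= \langle\mathcal{Q}\varepsilon + (\mathcal{H} - \pi_0)\{\Theta - \bar\Theta - (\mathcal{P}_T\mathcal{H}\mathcal{P}_T + \lambda_2\mathcal{P}_T)^{-1}\mathcal{P}_T\Delta\}, \mathcal{P}_T^\perp\tilde\Delta\rangle \\
&= \langle\mathcal{Q}\varepsilon + \Delta - \mathcal{R}(\mathcal{P}_T\mathcal{R} + \mathcal{P}_T)^{-1}\mathcal{P}_T\Delta, \mathcal{P}_T^\perp\tilde\Delta\rangle.
\end{aligned}$$

Thus, by the second inequalities of (8) and (9),

$$\langle\varepsilon + \mathcal{H}(\Theta - \Theta^*), \mathcal{P}_T^\perp\tilde\Delta\rangle \le \lambda_1\|\mathcal{P}_T^\perp\tilde\Delta\|_{(N)}. \tag{24}$$

Since $\Theta^* = \Delta^* + \bar\Theta \in T$ and the singular values of $\bar\Theta$ are no smaller than $(\pi_0s_r - \lambda_1)/(\pi_0 + \lambda_2) \ge
(s_r - \lambda_1/\lambda_2)/\xi \ge 4\lambda_1/(\lambda_2\xi)$ by the second inequality in (7), Proposition 1 and (22) imply

$$\mathrm{Rem}_1 \le 2\|\Theta^* - \bar\Theta\|_{(F)}^2/\{(\pi_0s_r - \lambda_1)/(\pi_0 + \lambda_2)\} \le r(\lambda_1/\pi_0)^2/(8\xi\lambda_1/\lambda_2). \tag{25}$$

It follows from (23), (24) and (25) that

$$\xi^2\|\tilde\Delta\|_{(F)}^2 \le \xi^2\langle(\mathcal{H} + \lambda_2)\tilde\Delta, \tilde\Delta\rangle/\lambda_2 \le \xi^2(\lambda_1/\lambda_2)\mathrm{Rem}_1 \le r\lambda_1^2/(4\pi_0^2). \tag{26}$$

Therefore, the error bound (10) follows from (21), (22) and (26).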



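Case 2: $\lambda_2 \ge \pi_0$. By applying the derivation of (23) to $\bar\Theta$ instead of $\Theta^*$, we find

$$\langle(\mathcal{H} + \lambda_2)\Delta_*, \Delta_*\rangle + \lambda_1\|\mathcal{P}_T^\perp\Delta_*\|_{(N)} \le \lambda_1(\|\bar\Theta\|_{(N)} - \langle UV^\top, \bar\Theta\rangle) + \langle\varepsilon + \mathcal{H}(\Theta - \bar\Theta) - \lambda_2\bar\Theta - \lambda_1UV^\top, \Delta_*\rangle.$$

By the definitions of $\Delta$, $\mathcal{R}$, and $\bar\Theta$, $\Delta = (\mathcal{H} - \pi_0)(\Theta - \bar\Theta) = \mathcal{H}(\Theta - \bar\Theta) - \lambda_2\bar\Theta - \lambda_1UV^\top$. This
and $\|\bar\Theta\|_{(N)} = \langle UV^\top, \bar\Theta\rangle$ give

$$\langle(\mathcal{H} + \lambda_2)\Delta_*, \Delta_*\rangle + \lambda_1\|\mathcal{P}_T^\perp\Delta_*\|_{(N)} \le \langle\varepsilon + \Delta, \Delta_*\rangle. \tag{27}$$

Since $\|\mathcal{P}_T^\perp(\varepsilon + \Delta)\|_{(S)} = \|\mathcal{P}_T^\perp\varepsilon\|_{(S)} \le \lambda_1$ by the third inequality in (9), we have

$$\langle\mathcal{P}_T^\perp(\varepsilon + \Delta), \Delta_*\rangle \le \lambda_1\|\mathcal{P}_T^\perp\Delta_*\|_{(N)}. \tag{28}$$

It follows from (27), (28) and the first inequalities of (8) and (9) that

$$\lambda_2\|\Delta_*\|_{(F)}^2 \le \langle\mathcal{P}_T(\varepsilon + \Delta), \Delta_*\rangle \le \{\|\mathcal{P}_T\varepsilon\|_{(F)} + \|\mathcal{P}_T\Delta\|_{(F)}\}\|\Delta_*\|_{(F)} \le \sqrt{r}\lambda_1\|\Delta_*\|_{(F)}/2.$$

Thus, due to $\lambda_2 \ge \pi_0$,

$$\xi\|\Delta_*\|_{(F)} \le (\xi/\lambda_2)\sqrt{r}\lambda_1/2 \le \sqrt{r}\lambda_1/\pi_0. \tag{29}$$

Therefore, the error bound (10) follows from (20) and (29). □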

Acknowledgments 

This research is partially supported by the NSF Grants DMS 0906420, DMS-11-06753 and
DMS-12-09014, and NSA Grant H98230-11-1-0205.






References 

[1] ACM SIGKDD and Netflix. Proceedings of KDD Cup and workshop. 2007.

[2] E. Candes and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717-772, 2009.

[3] E. J. Candes and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 56(5):2053-2080, 2009.

[4] D. Gross. Recovering low-rank matrices from few coefficients in any basis. CoRR, abs/0910.1879, 2009.

[5] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980-2998, 2010.

[6] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057-2078, 2010.

[7] V. Koltchinskii, K. Lounici, and A. B. Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics, 39:2302-2329, 2011.

[8] R. Mazumder, T. Hastie, and R. Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11:2287-2322, 2010.

[9] S. Negahban and M. J. Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. 2010.

[10] R. I. Oliveira. Concentration of the adjacency matrix and of the laplacian in random graphs with independent edges. Technical Report arXiv:0911.0600, arXiv, 2010.

[11] B. Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12:3413-3430, 2011.

[12] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Found. Comput. Math. doi:10.1007/s10208-011-9099-z, 2011.

[13] P.-A. Wedin. Perturbation bounds in connection with singular value decomposition. BIT, 12:99-111, 1972.

[14] C.-H. Zhang and T. Zhang. A general framework of dual certificate analysis for structured sparse recovery problems. Technical report, arXiv:1201.3302v1, 2012.

[15] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. J. R. Statist. Soc. B, 67:301-320, 2005.






